Reliable Software for Unreliable Hardware - A Cross-Layer Approach
Author
Abstract
2) The Instruction Error Masking Index estimates the probability that an error at an instruction will be masked before it reaches the final program output, i.e. it does not become visible at the application output and is therefore denoted as 'masked'. 3) In case the error is not masked, the Error Propagation Index estimates how many outputs will be affected by the unmasked error. These instruction-level estimates are then used to obtain the reliability estimates at the basic block and function/task levels. In the optimization flow, these models are leveraged to quantify the reliability-wise importance of different instructions, basic blocks, and functions, which enables selective reliability optimization at different system layers under tolerable performance overhead constraints.
Cross-Layer Software Program Reliability Optimization: This thesis develops concepts and techniques for cross-layer reliability optimization and leverages multiple system layers for reliable composition and execution of application software programs. First, multiple versions of a software program are obtained that enable run-time tradeoffs between reliability and performance properties. This is done through the following two means.
1) Different reliability-driven software transformations and instruction scheduling techniques are proposed that lower spatial/temporal vulnerabilities and the probabilities of software program failures and Incorrect Outputs by reducing the number of executions of critical instructions (such as loads, stores, branches, jumps, and calls). Applying these transformations in constrained scenarios provides on average 60% fewer software program failures (i.e. crashes, halts, hangs, aborts), and thus increased software reliability.
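As a rough illustration of how the instruction-level estimates above might be combined into a basic-block score, the following Python sketch multiplies a per-instruction vulnerability weight, the non-masking probability, and the propagation count, then sums over the block. The aggregation formula, the `InstrEstimate` type, and all numeric values are assumptions made for this example; the abstract does not state how the thesis actually combines the indices.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class InstrEstimate:
    """Hypothetical per-instruction estimates (names are illustrative)."""
    name: str
    vulnerability: float  # assumed weight in [0, 1] that an error hits this instruction
    masking: float        # Instruction Error Masking Index, probability in [0, 1]
    propagation: float    # Error Propagation Index: outputs affected if not masked


def expected_affected_outputs(instr: InstrEstimate) -> float:
    # Assumed model: an error matters only if it occurs AND is not masked,
    # and then it corrupts `propagation` outputs.
    return instr.vulnerability * (1.0 - instr.masking) * instr.propagation


def block_score(instrs: Iterable[InstrEstimate]) -> float:
    # Assumed aggregation: plain sum over the instructions of a basic block.
    return sum(expected_affected_outputs(i) for i in instrs)


# Made-up values, chosen only to make the arithmetic easy to follow.
block = [
    InstrEstimate("load", vulnerability=0.5, masking=0.5, propagation=4.0),
    InstrEstimate("add", vulnerability=0.25, masking=0.0, propagation=2.0),
    InstrEstimate("store", vulnerability=0.5, masking=1.0, propagation=8.0),
]

if __name__ == "__main__":
    for instr in block:
        print(instr.name, expected_affected_outputs(instr))
    print("block score:", block_score(block))
```

In this toy model a fully masked instruction (`masking=1.0`) contributes nothing to the block score regardless of how far its errors would propagate, which is the intuition behind ranking instructions by reliability-wise importance.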
2) Reliability-driven selective instruction redundancy is proposed that selects a set of reliability-wise important instructions in different functions for redundancy-based protection, depending upon the instruction vulnerabilities, instruction-level error masking and propagation, and protection overhead under a user-provided tolerable performance overhead constraint. The key is to give more protection to the less resilient parts of the software program and less protection to the more resilient parts, in order to achieve a high degree of reliability in constrained scenarios. Compared to the state of the art, the proposed selective instruction protection provides 4.84x improved reliability at a 50% tolerable performance overhead constraint.
Afterwards, the multiple reliable versions are exploited by a reliability-driven run-time system that enhances the reliability of multiple concurrently executing applications in a manycore processor, while accounting for the frequency variations and degradation due to design-time process variation and run-time aging-induced effects. It performs the following key operations to facilitate reliable software program execution.
1) Adaptively activating and deactivating redundant multithreading for different applications in a manycore processor in area-constrained scenarios. This accounts for the variable resilience properties and deadline requirements of different applications along with a history of the encountered errors.
2) Dynamically selecting an appropriate reliable version for each application, considering the cores' frequency variations due to design-time process variations and run-time aging-induced performance degradation.
3) Mapping the selected application version onto the cores used for redundant multithreading at run time such that the execution properties of the redundant threads closely match the frequency properties of the allocated cores, considering core-to-core frequency variations.
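The selective instruction redundancy idea described above (protect the less resilient instructions first, within a tolerable overhead) can be sketched as a budgeted selection problem. The Python snippet below uses a simple greedy gain-per-overhead heuristic; this heuristic, the function name `select_for_protection`, and the candidate values are assumptions for illustration and are not taken from the thesis, whose actual selection algorithm is not given in the abstract.

```python
from typing import List, Tuple

# (instruction name, estimated reliability gain if protected, protection overhead)
Candidate = Tuple[str, float, float]


def select_for_protection(
    candidates: List[Candidate], budget: float
) -> Tuple[List[str], float]:
    """Greedy sketch: pick instructions by gain per unit overhead until the
    user-provided overhead budget would be exceeded. Overheads must be > 0."""
    ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
    chosen: List[str] = []
    used = 0.0
    for name, _gain, overhead in ranked:
        if used + overhead <= budget:
            chosen.append(name)
            used += overhead
    return chosen, used


# Made-up candidates; gains and overheads are in arbitrary illustrative units.
candidates = [
    ("load_a", 8.0, 2.0),
    ("branch_b", 3.0, 1.0),
    ("add_c", 1.0, 2.0),
    ("store_d", 6.0, 4.0),
]

if __name__ == "__main__":
    chosen, used = select_for_protection(candidates, budget=4.0)
    print("protected:", chosen, "overhead used:", used)
```

A greedy ratio rule is not optimal for this knapsack-like problem in general, but it keeps the sketch short and shows how a fixed overhead budget forces a choice among instructions.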
Compared to state-of-the-art single-layer reliability optimization techniques, the proposed cross-layer approach achieves 16%-57% improved software reliability on average for different chip configurations, various process variation maps, and different aging years. In addition to the scientific contributions discussed above, several tools for gate-level soft error analysis and aging analysis, as well as an integrated fault generation and injection system for instruction set simulators, have been developed in the scope of this work and are made available at http://ces.itec.kit.edu/846.php.
Similar resources
Reliable Computation on Unreliable Hardware: Can We Have Our Digital Cake and Eat It?
The digital abstraction allowed us to achieve unprecedented scalability in both hardware and software complexity. Yet, circuit level digital abstraction is becoming increasingly expensive to maintain. We show it’s possible to raise the digital abstraction up to software layers and yet provide correctness ...
A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing feature method. In this approach, the components in the time-frequency representation of the signal (spectrogram) that present a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...
Transactional Encoding for Tolerating Transient Hardware Errors
The decreasing feature size of integrated circuits leads to less reliable hardware with a higher likelihood of errors. Without adding additional failure detection and masking mechanisms, the next generations of CPUs would at least be unfit for executing mission- and safety-critical applications. One common approach is the replicated execution of programs on redundant cores, which is increasingly d...
Design of energy efficient and dependable health monitoring systems under unreliable nanometer technologies
In this paper we investigate the impact of potential hardware misbehavior induced by reliability issues and scaled voltages in wireless body sensor network (WBSN) nodes. Our study reveals the inherent resilience of popular algorithms in cardiac monitoring applications and argues that by exploiting the unique characteristics of such algorithms the energy efficiency and reliability of such system...
Reliable Routing In VANET Using Cross Layer Approach
Vehicular Ad hoc Networks (VANETs), a subclass of mobile ad hoc networks (MANETs), are a promising approach for intelligent transport systems (ITS). Routing in VANETs is more complicated than in MANETs due to unique characteristics such as a highly dynamic nature, predictable mobility, scalability, and frequent disconnection. In recent years, stateless routing has drawn the attention of researchers as it provides ...
Journal title:
Volume, issue
Pages -
Publication date: 2015